Analysis of slipped sequences in EST projects

Authors

  • Christian Baudet
  • Zanoni Dias
Abstract

Slippage is an important sequencing problem that can occur in EST projects, yet very few studies have addressed it. In this work we propose three new methods to detect slippage artifacts: the “Arithmetic Mean Method”, the “Geometric Mean Method”, and the “Echo Coverage Method”. Each method is simple and offers two strategies for processing sequences: suffix and subsequence. Using the 291,689 EST sequences produced in the SUCEST project [9], we performed comparative tests between the proposed methods and the Telles and Silva Method [8]. The subsequence strategy is better than the suffix strategy because it is not anchored at the end of the sequence, which makes it more flexible at finding slippage at the beginning of the EST. Compared with the Telles and Silva Method, the advantage of our methods is that they do not discard the majority of the sequences marked as slippage; instead, they remove only the slipped artifact from the sequence. The tests indicate that the “Echo Coverage Method” with the subsequence strategy offers the best compromise between slippage detection and ease of calibration.
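
The difference between the two strategies can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the abstract does not define how any of the three methods score a region, so `echo_fraction` is a hypothetical toy score (the fraction of bases that repeat the base `shift` positions earlier), and the function names, the 0.95 threshold, and the minimum length are all invented for the example. The sketch shows only why a suffix-anchored search can miss an artifact near the start of a read while a subsequence search can find it.

```python
def echo_fraction(seq, shift=1):
    """Toy score (an assumption, not the paper's): share of bases equal to the base `shift` earlier."""
    if len(seq) <= shift:
        return 0.0
    hits = sum(1 for i in range(shift, len(seq)) if seq[i] == seq[i - shift])
    return hits / (len(seq) - shift)


def suffix_candidates(seq, min_len):
    """Suffix strategy: every candidate region ends at the last base of the read."""
    n = len(seq)
    return [(start, n) for start in range(0, n - min_len + 1)]


def subsequence_candidates(seq, min_len):
    """Subsequence strategy: candidate regions may start and end anywhere."""
    n = len(seq)
    return [(s, e) for s in range(n) for e in range(s + min_len, n + 1)]


def best_region(seq, candidates, shift=1, threshold=0.95):
    """Return the longest candidate (start, end) whose toy score reaches the threshold, or None."""
    best = None
    for start, end in candidates:
        score = echo_fraction(seq[start:end], shift)
        if score >= threshold and (best is None or end - start > best[1] - best[0]):
            best = (start, end)
    return best


# A made-up read with a repetitive stretch at the beginning.
seq = "A" * 12 + "GCTAGCATCGATCGGATCCA"
print(best_region(seq, suffix_candidates(seq, 8)))       # None
print(best_region(seq, subsequence_candidates(seq, 8)))  # (0, 12)
```

In this toy setting, every suffix long enough to be considered also contains non-repetitive bases from the end of the read, so none reaches the threshold, whereas the subsequence search isolates the first 12 bases. The exhaustive enumeration is cubic in the read length and is meant for illustration only.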

Similar resources

P-215: Discovery of A Novel APA Variant of A Human Potential Gene Based on Expressed Sequenced Tags Analysis

Background: Expressed sequence tags (ESTs) are sequences of cDNA fragments prepared from different tissue sources. There are over one million of these sequences in the publicly available database, and these sequences are believed to represent more than half of all human genes. The ESTs belong to different cDNA libraries, each prepared from one particular cell type, organ, or tumor. Therefore, th...

Expressed Sequence Tags as a Tool for Phylogenetic Analysis of Placental Mammal Evolution

BACKGROUND We investigate the usefulness of expressed sequence tags, ESTs, for establishing divergences within the tree of placental mammals. This is done on the example of the established relationships among primates (human), lagomorphs (rabbit), rodents (rat and mouse), artiodactyls (cow), carnivorans (dog) and proboscideans (elephant). METHODOLOGY/PRINCIPAL FINDINGS We have produced 2000 E...

Computational Identification of Micro RNAs and Their Transcript Target(s) in Field Mustard (Brassica rapa L.)

Background: Micro RNAs (miRNAs) are a pivotal part of non-protein-coding endogenous small RNA molecules that regulate the genes involved in plant growth and development, and respond to biotic and abiotic environmental stresses posttranscriptionally. Objective: In the present study, we report the results of a systemic search for identification of new miRNAs in B. rapa using homology-based ...

Analysis of slipped sequences in EST projects.

Slippage is an important sequencing problem that can occur in EST projects. However, very few studies have addressed this. We propose three new methods to detect slippage artifacts: arithmetic mean method, geometric mean method, and echo coverage method. Each method is simple and has two different strategies for processing sequences: suffix and subsequence. Using the 291,689 EST sequences produ...

Implementation and testing of an automated EST processing and similarity analysis system

Expressed sequence tag (EST) sequencing projects are being undertaken in an effort to identify the function of as many genes as possible from entire genomes. Putative function can be determined by analyzing the similarity of the ESTs to sequences in the public databases. We are involved in a long-term project to research and develop database technology to store and analyze ESTs for Arabidopsis t...

Publication date: 2008